Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications

Authors

  • Bogdan Vlasenko
  • Dmytro Prylipko
  • Ronald Böck
  • Andreas Wendemuth
Abstract

The role of automatic emotion recognition from speech is growing continuously because of the accepted importance of reacting to the emotional state of the user in human–computer interaction. Most state-of-the-art emotion recognition methods are based on turn- and frame-level analysis independent of phonetic transcription. Here, we are interested in a phoneme-based classification of the level of arousal in acted and spontaneous emotions. To start, we show that our previously published classification technique, which achieved strong results in the Interspeech 2009 Emotion Challenge, cannot provide sufficiently good classification in cross-corpora evaluation (a condition close to real-life applications). To demonstrate the robustness of our emotion classification techniques, we use cross-corpora evaluation for a simplified two-class problem, namely high and low arousal emotions. We use emotion classes on a phoneme level for classification. We build our speaker-independent emotion classifier with HMMs, using GMM-based production probabilities and MFCC features. This classifier performs equally well with a complete phoneme set as it does with a reduced set of indicative vowels (7 out of 39 phonemes in the German SAM-PA list). Afterwards, we compare the emotion classification performance of the technique used in the Emotion Challenge with phoneme-based classification within the same experimental setup. With phoneme-level emotion classes we increase cross-corpora classification performance by about 3.15% absolute (4.69% relative) for models trained on acted emotions (EMO-DB dataset) and evaluated on spontaneous emotions (VAM dataset); under the reverse experimental conditions (trained on VAM, tested on EMO-DB) we obtain a 15.43% absolute (23.20% relative) improvement.
We show that using phoneme-level emotion classes can improve classification performance even with comparably low speech recognition performance obtained with scant a priori knowledge about the language, implemented as a zero-gram for word-level modeling and a bi-gram for phoneme-level modeling. Finally, we compare our results with the state-of-the-art cross-corpora evaluations on the VAM database. For training our models, we use an almost 15 times smaller training set, consisting of 456 utterances (210 low and 246 high arousal emotions) instead of 6820 utterances (4685 high and 2135 low arousal emotions). We are nevertheless able to increase cross-corpora classification performance by about 2.25% absolute (3.22% relative), from UA = 69.7% obtained by Zhang et al. to UA = 71.95%. Crown Copyright © 2012 Published by Elsevier Ltd. All rights reserved.
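The abstract reports results as UA and as absolute and relative gains without spelling out the arithmetic. The sketch below assumes that UA denotes unweighted average recall (the mean of per-class recalls), as is common in the emotion-challenge literature; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally
    regardless of how many utterances it contains."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def improvement(baseline, new):
    """Return (absolute, relative) gain; both values are in percent."""
    absolute = new - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Toy two-class labels (hypothetical, not data from the paper).
y_true = ["high", "high", "high", "low"]
y_pred = ["high", "high", "low", "low"]
ua = unweighted_average_recall(y_true, y_pred)

# UA figures quoted in the abstract: 69.7% (Zhang et al.) vs. 71.95%.
abs_gain, rel_gain = improvement(69.7, 71.95)

# Training-set sizes quoted in the abstract.
size_ratio = 6820 / 456
```

Under these assumptions, `abs_gain` comes out at 2.25 points and `rel_gain` at roughly 3.2%, in line with the figures in the abstract, and `size_ratio` is just under 15. Averaging per-class recall rather than using plain accuracy keeps an unbalanced test set from rewarding a classifier that favors the larger class.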




Journal title:
  • Computer Speech & Language

Volume 28  Issue 

Pages  -

Publication date: 2014